HTTP query protocol with proof-of-concept implementations obtaining subsets of remote HTML data via XPath or CSS Selectors
Note that this is still in alpha, so it may be unstable or non-functional; the PHP server and the add-on, in particular, are not currently functional.
HTTP query protocol with proof-of-concept implementations obtaining subsets of remote HTML data via XPath or CSS Selectors, essentially providing something like a native XML database without any need to import data (the server simply reads your static HTML/XML files on demand and delivers the subset requested by the user or application).
HTTPQuery is an experimental protocol with the following tools:
A PHP demo server is also planned.
Although HTML files, the ubiquitous files of the web, are THEMSELVES databases, there has been a curious lack of ability to query them: either their contents must first be entered into a database, or the consumer is forced to download the entire file and then extract the subset they want. Even when time has been taken to enter file contents into a database, users are often hamstrung by developer decisions, as they are not usually empowered to run arbitrary queries.
This HTTP Query protocol, together with the reference Node.js server and Firefox client implementations, is meant to give users and developers a means to overcome these barriers and limitations. By default, your users can query any document that you allow in the manner they wish, and you can keep your data in simple static files, whether arbitrary HTML files or HTML files shaped more like traditional, simpler databases (e.g., an HTML file consisting solely of a single table, hierarchical list, etc.).
Other possible uses may include selective spidering.
Note that, as mentioned, the protocol syntax and the tools are still very much experimental; use them at your own risk. Allowing arbitrary XPath or CSS Selector syntax may increase the risk of denial-of-service (DoS/DDoS) attacks.
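One way a server might reduce that exposure is to reject oversized queries before evaluating them and to cap how many matches it returns. The sketch below is only an illustration: the limit values and the function names checkQuery and capResults are assumptions, not part of the protocol.

```javascript
// Hypothetical limits; the protocol text does not specify any values.
const MAX_QUERY_LENGTH = 256;
const MAX_RESULTS = 100;

// Decide whether a client-supplied XPath or CSS Selector string
// should be evaluated at all.
function checkQuery(query, maxLength = MAX_QUERY_LENGTH) {
  if (typeof query !== 'string' || query.trim() === '') {
    return { ok: false, reason: 'empty' };
  }
  if (query.length > maxLength) {
    return { ok: false, reason: 'too-long' };
  }
  return { ok: true, reason: null };
}

// Truncate a list of matched fragments so that a single request
// cannot return an unbounded amount of data.
function capResults(fragments, maxResults = MAX_RESULTS) {
  return fragments.slice(0, maxResults);
}
```

A length check does not bound evaluation cost on its own; a real deployment would probably also want a time limit around query evaluation.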
The Web IS a database, and it is about time that its data is opened up, from the humblest content creator to experienced mashup developers.
While the first goal is to let regular website content creators make their content available to searches (with HTML/XML being the inevitable document-centric format), JSON support (via JSONPath / RQL?) is also envisaged.
It is also hoped that, whether through minor markup changes or schema attachment, intelligent widgets will become more of a norm, exposing sophisticated, offlineable, type-aware and paginated functionality to users without requiring the content creator to be a developer.
See the todos for more future goals for the project.
Why require headers rather than GET-friendly bookmarkable/shareable request parameters? - I wanted the protocol to be able to overlay any dynamic as well as static system which might already be using its own request parameters. However, I would like to see a non-HTTP web protocol be created to work with these headers.
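To illustrate the point about overlaying existing systems, here is a minimal sketch of a client preparing a request whose URL keeps the site's own parameters while the query travels in headers. The header prefix query-request- and the helper name buildQueryRequest are assumptions made for this example; only query-format is named in the protocol text.

```javascript
// Assumed header naming; the request header names are not spelled out here.
const REQUEST_HEADER_PREFIX = 'query-request-';

// Build fetch-style request settings without touching the URL.
function buildQueryRequest(url, queryType, query, { json = false } = {}) {
  const headers = { [REQUEST_HEADER_PREFIX + queryType]: query };
  if (json) {
    headers['query-format'] = 'json'; // header named in the protocol text
  }
  return { url, options: { method: 'GET', headers } };
}

const request = buildQueryRequest(
  'https://example.com/catalog.html?page=2&sort=asc',
  'css3',
  'table tr:first-child',
  { json: true }
);
// The URL, including the site's own parameters, is passed through unchanged;
// it could then be sent with fetch(request.url, request.options).
```

Because nothing is appended to the URL, a site that already interprets page or sort parameters is unaffected by the query.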
If I generate my data dynamically (e.g., because I have files too large to be efficiently queried against a static file), how is the protocol still useful? - The query mechanism and API will still be reusable by local apps (or remote ones such as the Firefox add-on if the server is enabled in a manner like the included Node server), code libraries, etc., even if you do not wish to restrict yourself to static files. For example, even though your API might filter the raw data as it is, an HTTPQuery could be allowed to run on top of that filtered data.
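The layering described here can be sketched as a wrapper around an existing page generator. Everything below is hypothetical: the header name query-request-css3 is assumed, and toySelectByTag is a deliberately crude stand-in for a real selector engine, included only so the example runs without dependencies.

```javascript
// Assumed request header name; not confirmed by this text.
const CSS_QUERY_HEADER = 'query-request-css3';

// Toy stand-in for a real selector engine: it only understands a bare
// tag name and does not handle nested elements of the same tag.
function toySelectByTag(html, tagName) {
  if (!/^[a-z][a-z0-9]*$/i.test(tagName)) return '';
  const pattern = new RegExp(`<${tagName}\\b[^>]*>.*?</${tagName}>`, 'gis');
  return (html.match(pattern) || []).join('');
}

// Wrap an existing page generator so that a query, when present,
// runs on top of whatever the application already produced.
function withHttpQuery(renderPage, select) {
  return function handle(request) {
    const html = renderPage(request); // the application's own filtering
    const query = request.headers[CSS_QUERY_HEADER];
    return query ? select(html, query) : html;
  };
}

// Hypothetical dynamic endpoint that already filters by a "min" parameter.
const prices = [5, 12, 30];
function renderPrices(request) {
  const items = prices.filter((p) => p >= request.params.min);
  return `<ul>${items.map((p) => `<li>${p}</li>`).join('')}</ul>`;
}

const handler = withHttpQuery(renderPrices, toySelectByTag);
const full = handler({ params: { min: 10 }, headers: {} });
const subset = handler({
  params: { min: 10 },
  headers: { [CSS_QUERY_HEADER]: 'li' },
});
```

Here full is the page as the application filtered it, and subset is the query applied to that already-filtered output.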
Why not use OData? - While OData has pioneered work in this direction, it is hoped that this simple protocol will gain support and allow piecemeal selection of content in a manner reusable by servers and clients with an absolutely bare minimum of effort by content creators (and even implementers).
... xpath1 and css3). The HTTPQuery server MUST NOT require this header when other HTTPQuery queries are supplied. (The server MAY utilize the client support header to display minimal content by default since the client user is assumed to be familiar with his own browser's capabilities in utilizing the protocol to query only what he needs. The header query-full-request MAY be submitted (instead or in addition) by the client to counter-act this assumption to display minimal content. If the client wishes to make the request for minimal data explicit, it can make a HEAD request.)
... xpath1, css3, and jsonata) before specific queries are made and MUST advertise the header when queries are successfully returned (and SHOULD return the header if there is a failure). This information MAY be used by clients to inform users of the query mechanisms available to them for the site.
... div element in the XHTML namespace (i.e., within <div xmlns="http://www.w3.org/1999/xhtml"></div>). The query-supporting server of XPath1 or CSS3 queries MUST also support the ability to recognize an additional client-supplied header, query-format set to the value json which will deliver the XML or HTML results in the JSON format while also recognizing the header query-content-type which will indicate the content-type of the wrapped fragments (i.e., text/html or an XML MIME type) as distinct from the regular Content-Type header which for JSON should be application/json.
... json so as to deliver the string in JSON format. A query-content-type response header MAY be provided if set to text/plain. (Headers may be added in the future to distinguish whether JSON delivery should concatenate text node results into a single string or not.)
The CSS Selector syntax has been modified to include the following pseudo-classes:
HTTP Query is a much lighter protocol. HTTP Query does hope to eventually support modification as does OData, but in a web-friendly, hierarchical manner such as with https://github.com/kriszyp/put-selector.
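Returning to the response rules for query-format and query-content-type described earlier, the following sketch shows how a server might package matched fragments. The function name buildQueryResponse and the array-of-strings JSON shape are assumptions, and wrapping non-JSON results in an XHTML div is only a reading of the partly truncated text above.

```javascript
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// fragments: serialized HTML/XML strings matched by a query.
// The array-of-strings JSON shape is an assumption made for illustration.
function buildQueryResponse(
  fragments,
  { format = null, contentType = 'text/html' } = {}
) {
  if (format === 'json') {
    return {
      headers: {
        // The regular Content-Type describes the JSON envelope...
        'Content-Type': 'application/json',
        // ...while query-content-type describes the wrapped fragments.
        'query-content-type': contentType,
      },
      body: JSON.stringify(fragments),
    };
  }
  // Without query-format: json, wrap the fragments in an XHTML div.
  return {
    headers: { 'Content-Type': contentType },
    body: `<div xmlns="${XHTML_NS}">${fragments.join('')}</div>`,
  };
}
```

A client that sent query-format: json would parse the body with JSON.parse and use query-content-type to decide how to treat each fragment.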
(INCOMPLETE)
... <table> export URL (but only enabling downloading within limits (see limits below); XPath/CSS Selectors (or paginating query mechanism, etc.) can then be translated back into equivalent SQL.
... <p n=""> for an automatic paragraph range selection interface
... query:) to request and reshape foreign sites with user permission
... <script></script>) (and then do for my own JML HTML-as-JSON content-type)
0.7.0
... req.jsonData set by middleware